Out-of-distribution (OOD) detection has attracted considerable attention from the machine learning research community in recent years due to its importance in deployed systems. Most previous studies have focused on detecting OOD samples in the multi-class classification task. However, OOD detection in the multi-label classification task remains underexplored. In this research, we propose YolOOD, a method that utilizes concepts from the object detection domain to perform OOD detection in the multi-label classification task. Object detection models have an inherent ability to distinguish between objects of interest (in-distribution) and irrelevant objects (e.g., OOD objects) in images that contain multiple objects from different categories. This ability allows us to convert a regular object detection model into an image classifier with inherent OOD detection capabilities with just minor changes. We compare our approach to state-of-the-art OOD detection methods and demonstrate YolOOD's ability to outperform these methods on a comprehensive suite of in-distribution and OOD benchmark datasets.
Translated by Google Translate
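Adversarial examples can be used to maliciously and covertly change a model's prediction. It is well known that an adversarial example designed for one model can also transfer to other models. This poses a major threat because it means that attackers can target systems in a black-box manner. In the domain of transferability, researchers have proposed ways to make attacks more transferable and to make models more robust to transferred examples. However, to the best of our knowledge, no work has proposed a means of ranking the transferability of adversarial examples from the perspective of a black-box attacker. This is an important task because an attacker is likely to use only a select set of examples and therefore needs to choose the samples that are most likely to transfer. In this paper, we suggest a method for ranking the transferability of adversarial examples without access to the victim's model. To accomplish this, we define and estimate the expected transferability of a sample given limited information about the victim. We also explore practical scenarios: one in which the adversary can select the best samples to attack, and one in which the adversary must use a specific sample but can choose among different perturbations. Through our experiments, we find that our ranking method can increase an attacker's success rate by up to 80% compared with no ranking.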
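While vision-and-language models perform well on tasks such as visual question answering, they struggle with basic human commonsense reasoning skills. In this work, we introduce WinoGAViL: an online game for collecting vision-and-language associations (e.g., werewolves to a full moon), used as a dynamic benchmark to evaluate state-of-the-art models. Inspired by the popular card game Codenames, a spymaster gives a textual cue related to several visual candidates, and another player must identify them. Human players are rewarded for creating associations that are challenging for a rival AI model but still solvable by other human players. We use the game to collect 3.5K instances, finding that they are intuitive for humans (>90% Jaccard index) but challenging for state-of-the-art AI models: the best model (ViLT) achieves a score of 52%, succeeding where the cue is visually salient. Our analysis, together with the feedback we collected from players, indicates that the collected associations require diverse reasoning skills, including general knowledge, common sense, abstraction, and more. We release the dataset, the code, and the interactive game, aiming to enable future data collection that can be used to develop models with better association abilities.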
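The Jaccard index mentioned above compares the set of candidates a solver selects with the set the spymaster intended. The snippet below is a minimal sketch of that metric only; the function name and the example file names are hypothetical and are not taken from the released code.

```python
def jaccard_index(predicted, gold):
    """Return |predicted ∩ gold| / |predicted ∪ gold| for two collections."""
    predicted, gold = set(predicted), set(gold)
    union = predicted | gold
    if not union:
        # Two empty selections are treated as a perfect match (an assumption).
        return 1.0
    return len(predicted & gold) / len(union)


# Hypothetical example: the cue has two intended images; the solver finds one
# of them and adds one wrong candidate.
gold = {"full_moon.jpg", "wolf.jpg"}
guess = {"full_moon.jpg", "forest.jpg"}
score = jaccard_index(guess, gold)  # one shared item out of three distinct items
```

A score of 1.0 means the solver picked exactly the intended candidates; any missed or extra candidate lowers the score.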
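Deep learning-based facial recognition (FR) models have demonstrated state-of-the-art performance in the past few years, even as wearing protective medical face masks became commonplace during the COVID-19 pandemic. Given the outstanding performance of these models, the machine learning research community has shown increasing interest in challenging their robustness. Initially, researchers presented adversarial attacks in the digital domain, and later the attacks were transferred to the physical domain. However, in many cases, attacks in the physical domain are conspicuous, for example requiring a sticker to be placed on the face, and may therefore raise suspicion in real-world environments (e.g., airports). In this paper, we propose a physical adversarial mask against state-of-the-art FR models that is disguised as a face mask, applied in the form of a carefully crafted pattern on the mask. In our experiments, we examine the transferability of our adversarial mask across a wide range of FR model architectures and datasets. In addition, we validate the effectiveness of our adversarial mask by printing the adversarial pattern on a fabric medical face mask, causing the FR system to identify only 3.34% of the participants wearing the mask (compared with a minimum of 83.34% for the other evaluated masks).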
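Due to their unfair behavior, deep learning-based face recognition systems have received increased media attention. As a consequence, large enterprises such as IBM have shut down their facial recognition and age prediction systems. Age prediction is an especially difficult application, and its fairness remains an open research problem (e.g., predicting the age of people of different ethnicities equally accurately). One of the main causes of unfair behavior in age prediction methods lies in the distribution and diversity of the training data. In this work, we present two novel approaches for dataset curation and data augmentation, which increase fairness through balanced feature curation and increase diversity through distribution-aware augmentation. To achieve this, we introduce out-of-distribution detection to the facial recognition domain, using it to select the data most relevant to the deep neural network's (DNN) task when balancing the data among age, ethnicity, and gender. Our approach shows promising results. Our best-trained DNN model outperforms the baselines in terms of fairness by up to 4.92 times and also improves the DNN's ability to generalize, outperforming the Amazon AWS and Microsoft Azure public cloud systems by 31.88% and 10.95%, respectively.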
In this paper, we formulate the problem of predicting a geolocation from free text as a sequence-to-sequence problem. Using this formulation, we obtain a geocoding model by training a T5 encoder-decoder transformer model with free text as the input and a geolocation as the output. The geocoding model was trained on geo-tagged wikidump data with adaptive cell partitioning for the geolocation representation. All of the code, including the REST-based application, the dataset, and the model checkpoints used in this work, is publicly available.
Real-life tools for decision-making in many critical domains are based on ranking results. With the increasing awareness of algorithmic fairness, recent works have presented measures for fairness in ranking. Many of those definitions consider the representation of different ``protected groups'' in the top-$k$ ranked items, for any reasonable $k$. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. In this paper, we study the problem of detecting groups with biased representation in the top-$k$ ranked items, eliminating the need to pre-define protected groups. The number of possible groups can be exponential, making the problem hard. We propose efficient search algorithms for two different fairness measures: global representation bounds and proportional representation. We then propose a method to explain the bias in the representations of groups, utilizing the notion of Shapley values. We conclude with an experimental study showing the scalability of our approach and demonstrating the usefulness of the proposed algorithms.
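To make the detection problem concrete, the sketch below enumerates attribute-value patterns by brute force and flags those that are under-represented in the top-$k$. It is not the efficient search algorithm the abstract refers to. The proportional test used here (count in the top-$k$ below `alpha * k * group_size / n`), the function name, and the parameters `alpha` and `max_attrs` are all assumptions made for illustration.

```python
from itertools import combinations


def groups_below_proportion(items, k, alpha=0.8, max_attrs=2):
    """Flag attribute-value patterns under-represented in the top-k.

    items: ranked list of dicts (best first), all with the same keys.
    A pattern is flagged when its count in the top-k is below
    alpha * k * (group size / n). This is a naive exhaustive scan.
    """
    n = len(items)
    top = items[:k]
    attrs = sorted(items[0].keys())
    flagged = []
    for r in range(1, max_attrs + 1):
        for combo in combinations(attrs, r):
            # Every value combination that actually occurs in the data.
            patterns = sorted({tuple(item[a] for a in combo) for item in items})
            for values in patterns:
                def matches(item):
                    return all(item[a] == v for a, v in zip(combo, values))
                total = sum(matches(i) for i in items)
                in_top = sum(matches(i) for i in top)
                if in_top < alpha * k * total / n:
                    flagged.append((dict(zip(combo, values)), in_top, total))
    return flagged


# Hypothetical ranking of eight candidates, best first.
ranked = [
    {"gender": "M", "region": "north"},
    {"gender": "M", "region": "south"},
    {"gender": "M", "region": "north"},
    {"gender": "F", "region": "north"},
    {"gender": "F", "region": "south"},
    {"gender": "F", "region": "south"},
    {"gender": "M", "region": "south"},
    {"gender": "F", "region": "north"},
]
flagged = groups_below_proportion(ranked, k=4)
for pattern, in_top, total in flagged:
    print(pattern, f"{in_top} of {total} members in the top-4")
```

The exhaustive scan grows exponentially with the number of attributes, which is the difficulty the abstract points to and the reason a pruned search is needed in practice.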
Neural volumetric representations have become a widely adopted model for radiance fields in 3D scenes. These representations are fully implicit or hybrid function approximators of the instantaneous volumetric radiance in a scene, and they are typically learned from multi-view captures of the scene. We investigate the new task of neural volume super-resolution: rendering high-resolution views of a scene captured at low resolution. To this end, we propose a neural super-resolution network that operates directly on the volumetric representation of the scene. This approach allows us to exploit an advantage of operating in the volumetric domain, namely the ability to guarantee consistent super-resolution across different viewing directions. To realize our method, we devise a novel 3D representation that hinges on multiple 2D feature planes. This allows us to super-resolve the 3D scene representation by applying 2D convolutional networks to the 2D feature planes. We validate the proposed method's ability to super-resolve multi-view consistent views, both quantitatively and qualitatively, on a diverse set of unseen 3D scenes, demonstrating a significant advantage over existing approaches.
We introduce MuJoCo MPC (MJPC), an open-source, interactive application and software framework for real-time predictive control, based on MuJoCo physics. MJPC allows the user to easily author and solve complex robotics tasks, and currently supports three shooting-based planners: derivative-based iLQG and Gradient Descent, and a simple derivative-free method we call Predictive Sampling. Predictive Sampling was designed as an elementary baseline, mostly for its pedagogical value, but turned out to be surprisingly competitive with the more established algorithms. This work does not present algorithmic advances; instead, it prioritises performant algorithms, simple code, and accessibility of model-based methods via intuitive and interactive software. MJPC is available at github.com/deepmind/mujoco_mpc; a video summary can be viewed at dpmd.ai/mjpc.
In this paper, we present a method for converting a given scene image into a sketch using different types and multiple levels of abstraction. We distinguish between two types of abstraction. The first considers the fidelity of the sketch, varying its representation from a more precise portrayal of the input to a looser depiction. The second is defined by the visual simplicity of the sketch, moving from a detailed depiction to a sparse sketch. Using an explicit disentanglement into two abstraction axes -- and multiple levels for each one -- provides users additional control over selecting the desired sketch based on their personal goals and preferences. To form a sketch at a given level of fidelity and simplification, we train two MLP networks. The first network learns the desired placement of strokes, while the second network learns to gradually remove strokes from the sketch without harming its recognizability and semantics. Our approach is able to generate sketches of complex scenes including those with complex backgrounds (e.g., natural and urban settings) and subjects (e.g., animals and people) while depicting gradual abstractions of the input scene in terms of fidelity and simplicity.